Stochastic pronunciation modeling by ergodic-HMM of acoustic sub-word units

Authors

V. Ramasubramanian

P. Srinivas

Thippur V. Sreenivas

Abstract

We propose a stochastic pronunciation model using an ergodic hidden Markov model (EHMM) of automatically derived acoustic sub-word units (SWUs). The proposed EHMM discovers the pronunciation structure inherent in the acoustic training data of a word without any a priori phonetic transcriptions. The EHMM is an HMM of HMMs: its states are SWU HMMs, and its state transitions compose the various possible lexicons. The EHMM parameters are estimated by an iterative segmental K-means procedure that jointly optimizes the sub-word units (states) and the pronunciation structure parameters (state transitions). The EHMM-based pronunciation model is evaluated on an English isolated word recognition task with 70 speakers drawn from 8 highly different first languages. Results show that the EHMM learns the lexicon distribution over the population of speakers for each word, thereby effectively modeling inter-speaker pronunciation variability. The EHMM improves word recognition accuracy by 8% (absolute) over the performance of a single most likely lexicon.
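The joint estimation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: each EHMM state here is a single 1-D Gaussian with fixed variance standing in for a full SWU HMM, and the function names (`viterbi`, `segmental_kmeans`, `collapse`), the initial means, and the toy data are all assumptions made for illustration. The loop alternates Viterbi segmentation with re-estimation of the unit parameters (state means) and the pronunciation structure (state transitions) in a fully connected model.

```python
import math

def viterbi(obs, means, log_trans, log_init, var=1.0):
    """Most likely state path for 1-D observations under Gaussian emissions."""
    n_states = len(means)

    def log_emit(x, k):
        return -0.5 * math.log(2 * math.pi * var) - (x - means[k]) ** 2 / (2 * var)

    score = [log_init[k] + log_emit(obs[0], k) for k in range(n_states)]
    back = []
    for x in obs[1:]:
        prev = score
        ptr, score = [], []
        for k in range(n_states):
            # Ergodic model: every state j may precede every state k.
            best = max(range(n_states), key=lambda j: prev[j] + log_trans[j][k])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][k] + log_emit(x, k))
        back.append(ptr)
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def segmental_kmeans(sequences, means, n_iter=10, floor=1e-3):
    """Alternate Viterbi segmentation with re-estimation (assumes n_iter >= 1)."""
    n = len(means)
    means = list(means)
    log_trans = [[math.log(1.0 / n)] * n for _ in range(n)]
    log_init = [math.log(1.0 / n)] * n
    for _ in range(n_iter):
        # Segmentation step: align every training sequence to the current model.
        paths = [viterbi(seq, means, log_trans, log_init) for seq in sequences]
        sums, counts = [0.0] * n, [0] * n
        trans_counts = [[floor] * n for _ in range(n)]  # floor avoids log(0)
        for seq, path in zip(sequences, paths):
            for x, k in zip(seq, path):
                sums[k] += x
                counts[k] += 1
            for j, k in zip(path, path[1:]):
                trans_counts[j][k] += 1
        # Re-estimation step: unit means and transition probabilities.
        means = [sums[k] / counts[k] if counts[k] else means[k] for k in range(n)]
        log_trans = [[math.log(c / sum(row)) for c in row] for row in trans_counts]
    return means, log_trans, paths

def collapse(path):
    """Merge repeated states into a unit sequence (one toy 'pronunciation')."""
    return [k for i, k in enumerate(path) if i == 0 or k != path[i - 1]]

# Two hypothetical speakers saying the same word with different unit sequences.
toy_data = [[0.1, -0.1, 0.0, 5.1, 4.9, 5.0, 10.0, 9.9],
            [0.2, 0.0, 10.1, 9.8, 10.0]]
means, log_trans, paths = segmental_kmeans(toy_data, [1.0, 4.0, 9.0], n_iter=5)
for p in paths:
    print(collapse(p))
```

In this sketch the learned transition matrix plays the role of the lexicon distribution: it assigns probability to each unit sequence observed across speakers instead of committing to one pronunciation.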

Similar resources

Automatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural Networks

Phonemic or phonetic sub-word units are the most commonly used atomic elements to represent speech signals in modern ASR systems. However, they are not the optimal choice for several reasons, such as the large amount of effort required to handcraft a pronunciation dictionary, pronunciation variations, human mistakes, and under-resourced dialects and languages. Here, we propose a data-driven pronunciation...

Which units for acoustic and language modeling for Khmer automatic speech recognition?

In this paper we present an overview of the development of a large vocabulary continuous speech recognition system for the Khmer language. Methods and tools used for quick collection of language resources for the development of an ASR system for a new under-resourced language are presented. Faced with the problem of lack of text data and the word error segmentation in language modeling, we investigate ...

Pronunciation Modeling for Large Vocabulary Speech Recognition by Arthur

The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy for automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of the units used to define the pronunciations of words. Others model the pronunciati...

A Nonparametric Bayesian Approach to Acoustic Model Discovery

We investigate the problem of acoustic modeling in which prior language-specific knowledge and transcribed data are unavailable. We present an unsupervised model that simultaneously segments the speech, discovers a proper set of sub-word units (e.g., phones) and learns a Hidden Markov Model (HMM) for each induced acoustic unit. Our approach is formulated as a Dirichlet process mixture model in ...

A comparison of grapheme and phoneme-based units for Spanish spoken term detection

The ever-increasing volume of audio data available online through the World Wide Web means that automatic methods for indexing and search are becoming essential. Hidden Markov model (HMM) keyword spotting and lattice search techniques are the two most common approaches used by such systems. In keyword spotting, models or templates are defined for each search term prior to accessing the speech a...

Publication date: 2005
